Fires in NYC and FDNY Response

Overview

For this assignment, we are going to investigate fires requiring the fire department to respond. Using data about the locations of firehouses and fires occurring in New York City, we want to know whether response times to fires differ across the city. Second, we will try to focus on one possible variable that could affect response times – the distance from the firehouse – and see whether we find the (expected) effect.

To keep this homework manageable, I am leaving out another part of the investigation: What is the effect of demographic and/or income characteristics of the neighborhood on response times. This is likely a bit more sensitive but also relevant from a public policy perspective.

Data

We rely on two data sets.

Incidents responded to by fire companies

NYC Open Data has data on all incidents responded to by fire companies. I have included the variable description file in the exercise folder. The following variables are available:

  • IM_INCIDENT_KEY: Unique identifier for each incident which serves
  • INCIDENT_TYPE_DESC The code and description of the incident category type
  • INCIDENT_DATE_TIME The date and time that the incident was logged into the Computer Aided Dispatch system
  • ARRIVAL_DATE_TIME The date and time that the first unit arrived on scene
  • UNITS_ONSCENE Total number of units that arrived on scene
  • LAST_UNIT_CLEARED_DATETIME The date and time that the incident was completed and the last unit cleared the scene
  • HIGHEST_LEVEL_DESC The highest alarm level that the incident received
  • TOTAL_INCIDENT_DURATION The total number of seconds from when then incident was created to when the incident was closed
  • ACTION_TAKEN1_DESC The code and description of the first action taken
  • ACTION_TAKEN2_DESC The code and description of the second action taken
  • ACTION_TAKEN3_DESC The code and description of the third action taken
  • PROPERTY_USE_DESC The code and description of the type of street or building where the incident took place
  • STREET_HIGHWAY The name of the street where the incident_took place
  • ZIP_CODE The postal zip code where the incident took place
  • BOROUGH_DESC The borough where the incident took place
  • FLOOR The floor of the building where the incident took place
  • CO_DETECTOR_PRESENT_DESC Indicator for when a CO detector was present
  • FIRE_ORIGIN_BELOW_GRADE_FLAG Indicator for when the fire originated below grade
  • STORY_FIRE_ORIGIN_COUNT Story in which the fire originated
  • FIRE_SPREAD_DESC How far the fire spread from the object of origin
  • DETECTOR_PRESENCE_DESC Indicator for when a detector was present
  • AES_PRESENCE_DESC Indicator for when an Automatic Extinguishing System is present
  • STANDPIPE_SYS_PRESENT_FLAG Indicator for when a standpipe was present in the area of origin of a fire

This dataset is only updated annually, and thus far only data from 2013 to 2018 is contained. The full dataset is also somewhat too large for an exercise (2.5M rows), so I suggest to limit yourself to a subset. I have added a file containing the subset of of only building fires (INCIDENT_TYPE_DESC == "111 - Building fire") for 2013 to 2018 only which yields about 14,000 incidents.

Unfortunately, the addresses of the incidents were not geocoded yet. Ideally, I would like you to know how to do this but am mindful about the hour or so required to get this done. So, here is the code. The geocodes (as far as they were returned successfully) are part of the data (as variables lat and lon).

library(ggmap)
library(tidyverse)
library(dplyr)
# Open "building_fires" file
fire_building <- read_csv("data/building_fires.csv")

FDNY Firehouse Listing

NYC Open Data also provides data on the location of all 218 firehouses in NYC. Relevant for our analysis are the following variables: FacilityName, Borough, Latitude, Longitude

firehouses <- read_csv("data/FDNY_Firehouse_Listing.csv") %>%
  dplyr::filter(!is.na(Latitude))

Note: 5 entries contain missing information, including on the spatial coordinates. We can exclude these for the exercise.

Tasks

1. Location of Severe Fires

Provide a leaflet map of the highest severity fires (i.e. subset to the highest category in HIGHEST_LEVEL_DESC) contained in the file buiding_fires.csv. Ignore locations that fall outside the five boroughs of New York City. Provide at least three pieces of information on the incident in a popup.

# subset to the highest alarm 
unique(fire_building$HIGHEST_LEVEL_DESC)
##  [1] "7 - Signal 7-5"                                   
##  [2] "2 - 2nd alarm"                                    
##  [3] "1 - More than initial alarm, less than Signal 7-5"
##  [4] "3 - 3rd alarm"                                    
##  [5] "5 - 5th alarm"                                    
##  [6] "4 - 4th alarm"                                    
##  [7] NA                                                 
##  [8] "0 - Initial alarm"                                
##  [9] "75 - All Hands Working"                           
## [10] "22 - Second Alarm"                                
## [11] "55 - Fifth Alarm"                                 
## [12] "11 - First Alarm"                                 
## [13] "33 - Third Alarm"                                 
## [14] "44 - Fourth Alarm"
highest_alarm <- fire_building %>%
  filter(HIGHEST_LEVEL_DESC=="7 - Signal 7-5"| HIGHEST_LEVEL_DESC=="75 - All Hands Working")
unique(highest_alarm$HIGHEST_LEVEL_DESC)
## [1] "7 - Signal 7-5"         "75 - All Hands Working"
head(highest_alarm)
## # A tibble: 6 x 27
##   IM_INCIDENT_KEY FIRE_BOX INCIDENT_TYPE_D… INCIDENT_DATE_T… ARRIVAL_DATE_TI…
##             <dbl> <chr>    <chr>            <chr>            <chr>           
## 1        55672965 2595     111 - Building … 01/01/2013 12:5… 01/01/2013 01:0…
## 2        55673299 2591     111 - Building … 01/01/2013 02:2… 01/01/2013 02:2…
## 3        55674170 1726     111 - Building … 01/01/2013 09:4… 01/01/2013 09:4…
## 4        55675138 0966     111 - Building … 01/01/2013 07:1… 01/01/2013 07:1…
## 5        55675164 7343     111 - Building … 01/01/2013 07:2… 01/01/2013 07:2…
## 6        55675781 5034     111 - Building … 01/02/2013 01:0… 01/02/2013 01:1…
## # … with 22 more variables: UNITS_ONSCENE <dbl>,
## #   LAST_UNIT_CLEARED_DATE_TIME <chr>, HIGHEST_LEVEL_DESC <chr>,
## #   TOTAL_INCIDENT_DURATION <dbl>, ACTION_TAKEN1_DESC <chr>,
## #   ACTION_TAKEN2_DESC <chr>, ACTION_TAKEN3_DESC <chr>,
## #   PROPERTY_USE_DESC <chr>, STREET_HIGHWAY <chr>, ZIP_CODE <dbl>,
## #   BOROUGH_DESC <chr>, FLOOR <chr>, CO_DETECTOR_PRESENT_DESC <chr>,
## #   FIRE_ORIGIN_BELOW_GRADE_FLAG <dbl>, STORY_FIRE_ORIGIN_COUNT <dbl>,
## #   FIRE_SPREAD_DESC <chr>, DETECTOR_PRESENCE_DESC <chr>,
## #   AES_PRESENCE_DESC <chr>, STANDPIPE_SYS_PRESENT_FLAG <dbl>, address <chr>,
## #   lon <dbl>, lat <dbl>
library(leaflet)
library(stringr)
highest_alarm_map <- leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
  addProviderTiles(provider = "Esri")%>%
  setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
  addCircleMarkers(lng = ~lon,lat = ~lat, radius = 1, popup = ~paste0("Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)))
highest_alarm_map

2. Layers and Clusters

a) Color by Type of Property

Start with the previous map. Now, distinguish the markers of the fire locations by PROPERTY_USE_DESC, i.e. what kind of property was affected. If there are too many categories, collapse some categories. Choose an appropriate coloring scheme to map the locations by type of affected property. Add a legend informing the user about the color scheme. Also make sure that the information about the type of affected property is now contained in the popup information. Show this map.

unique(highest_alarm$PROPERTY_USE_DESC)
##   [1] "579 - Motor vehicle or boat sales, services, repair"     
##   [2] "429 - Multifamily dwelling"                              
##   [3] "419 - 1 or 2 family dwelling"                            
##   [4] "700 - Manufacturing, processing"                         
##   [5] "161 - Restaurant or cafeteria"                           
##   [6] "881 - Parking garage, (detached residential garage)"     
##   [7] "564 - Laundry, dry cleaning"                             
##   [8] "400 - Residential, other"                                
##   [9] "559 - Recreational, hobby, home repair sales, pet store" 
##  [10] "500 - Mercantile, business, other"                       
##  [11] "549 - Specialty shop"                                    
##  [12] "331 - Hospital - medical or psychiatric"                 
##  [13] "131 - Church, mosque, synagogue, temple, chapel"         
##  [14] "210 - Schools, non-adult, other"                         
##  [15] "519 - Food and beverage sales, grocery store"            
##  [16] "449 - Hotel/motel, commercial"                           
##  [17] "173 - Bus station"                                       
##  [18] "891 - Warehouse"                                         
##  [19] "599 - Business office"                                   
##  [20] "580 - General retail, other"                             
##  [21] "311 - 24-hour care Nursing homes, 4 or more persons"     
##  [22] "UUU - Undetermined"                                      
##  [23] "888 - Fire station"                                      
##  [24] "900 - Outside or special property, other"                
##  [25] "965 - Vehicle parking area"                              
##  [26] "100 - Assembly, other"                                   
##  [27] "960 - Street, other"                                     
##  [28] "332 - Hospices"                                          
##  [29] "439 - Boarding/rooming house, residential hotels"        
##  [30] "123 - Stadium, arena"                                    
##  [31] "141 - Athletic/health club"                              
##  [32] "931 - Open land or field"                                
##  [33] "962 - Residential street, road or residential driveway"  
##  [34] "899 - Residential or self-storage units"                 
##  [35] "322 - Alcohol or substance abuse recovery center"        
##  [36] "170 - Passenger terminal, other"                         
##  [37] "557 - Personal service, including barber & beauty shops" 
##  [38] "880 - Vehicle storage, other"                            
##  [39] "460 - Dormitory-type residence, other"                   
##  [40] "800 - Storage, other"                                    
##  [41] "610 - Energy production plant, other"                    
##  [42] "926 - Outbuilding, protective shelter"                   
##  [43] "000 - Property Use, other"                               
##  [44] "511 - Convenience store"                                 
##  [45] "162 - Bar or nightclub"                                  
##  [46] "539 - Household goods, sales, repairs"                   
##  [47] "981 - Construction site"                                 
##  [48] "180 - Studio/theater, other"                             
##  [49] "183 - Movie theater"                                     
##  [50] "459 - Residential board and care"                        
##  [51] "241 - Adult education center, college classroom"         
##  [52] "152 - Museum"                                            
##  [53] "593 - Office:  veterinary or research"                   
##  [54] "130 - Places of worship, funeral parlors, other"         
##  [55] "160 - Eating, drinking places, other"                    
##  [56] "300 - Health care, detention, & correction, other"       
##  [57] "642 - Electrical distribution"                           
##  [58] "963 - Street or road in commercial area"                 
##  [59] "342 - Doctor, dentist or oral surgeon office"            
##  [60] "529 - Textile, wearing apparel sales"                    
##  [61] "321 - Mental retardation/development disability facility"
##  [62] "340 - Clinics, doctors offices, hemodialysis cntr, other"
##  [63] "882 - Parking garage, general vehicle"                   
##  [64] "150 - Public or government, other"                       
##  [65] "174 - Rapid transit station"                             
##  [66] "648 - Sanitation utility"                                
##  [67] "200 - Educational, other"                                
##  [68] "596 - Post office or mailing firms"                      
##  [69] "215 - High school/junior high school/middle school"      
##  [70] "110 - Fixed-use recreation places, other"                
##  [71] "592 - Bank"                                              
##  [72] "365 - Police station"                                    
##  [73] "571 - Service station, gas station"                      
##  [74] "182 - Auditorium, concert hall"                          
##  [75] "581 - Department or discount store"                      
##  [76] "808 - Outbuilding or shed"                               
##  [77] "839 - Refrigerated storage"                              
##  [78] "121 - Ballroom, gymnasium"                               
##  [79] "112 - Billiard center, pool hall"                        
##  [80] "NNN - None"                                              
##  [81] "142 - Clubhouse"                                         
##  [82] "140 - Clubs, other"                                      
##  [83] "569 - Professional supplies, services"                   
##  [84] "124 - Playground"                                        
##  [85] "363 - Reformatory, juvenile detention center"            
##  [86] "974 - Aircraft loading area"                             
##  [87] "464 - Barracks, dormitory"                               
##  [88] "635 - Computer center"                                   
##  [89] "211 - Preschool"                                         
##  [90] "186 - Film/movie production studio"                      
##  [91] "181 - Live performance theater"                          
##  [92] "134 - Funeral parlor"                                    
##  [93] "984 - Industrial plant yard - area"                      
##  [94] "629 - Laboratory or science lababoratory"                
##  [95] "144 - Casino, gambling clubs"                            
##  [96] "936 - Vacant lot"                                        
##  [97] "254 - Day care, in commercial property"                  
##  [98] "155 - Courthouse"                                        
##  [99] "807 - Outside material storage area"                     
## [100] "250 - Day care, other (Conversion only)"                 
## [101] "615 - Electric-generating plant"                         
## [102] "213 - Elementary school, including kindergarten"         
## [103] "143 - Yacht Club"                                        
## [104] "341 - Clinic, clinic-type infirmary"                     
## [105] "639 - Communications center"                             
## [106] "898 - Dock, marina, pier, wharf"                         
## [107] "952 - Railroad yard"
highest_alarm %>%
  group_by(PROPERTY_USE_DESC) %>%
  count()%>%
  arrange(desc (n))
## # A tibble: 107 x 2
## # Groups:   PROPERTY_USE_DESC [107]
##    PROPERTY_USE_DESC                                       n
##    <chr>                                               <int>
##  1 429 - Multifamily dwelling                           4383
##  2 419 - 1 or 2 family dwelling                         2974
##  3 500 - Mercantile, business, other                     404
##  4 161 - Restaurant or cafeteria                         154
##  5 519 - Food and beverage sales, grocery store          135
##  6 599 - Business office                                 123
##  7 564 - Laundry, dry cleaning                            71
##  8 881 - Parking garage, (detached residential garage)    69
##  9 400 - Residential, other                               68
## 10 449 - Hotel/motel, commercial                          49
## # … with 97 more rows
library(stringr)
highest_alarm['class']=list(str_sub(highest_alarm$PROPERTY_USE_DESC,1,1))

highest_alarm$class[highest_alarm$class == "1"] <- "Assembly"
highest_alarm$class[highest_alarm$class == "2"] <- "Educational"
highest_alarm$class[highest_alarm$class == "3"] <- "Healthcare, Detention and Correction"
highest_alarm$class[highest_alarm$class == "4"] <- "Residential"
highest_alarm$class[highest_alarm$class == "5"] <- "Mercantile and Business"
highest_alarm$class[highest_alarm$class == "6"] <- "Energy Production Plant"
highest_alarm$class[highest_alarm$class == "7"] <- "Manufacturing and Processing"
highest_alarm$class[highest_alarm$class == "8"] <- "Storage"
highest_alarm$class[highest_alarm$class %in% c("9","U","N","0")] <- "Other Property"

highest_alarm$class <- factor(highest_alarm$class, levels= c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))

highest_alarm %>%
  group_by(class) %>%
  count() %>%
  arrange(desc(n))
## # A tibble: 9 x 2
## # Groups:   class [9]
##   class                                    n
##   <fct>                                <int>
## 1 Residential                           7508
## 2 Mercantile and Business                856
## 3 Assembly                               290
## 4 Storage                                149
## 5 Other Property                         140
## 6 Healthcare, Detention and Correction    65
## 7 Educational                             48
## 8 Manufacturing and Processing            40
## 9 Energy Production Plant                  8
library(RColorBrewer)

pal <- colorFactor(palette = "Set3", levels = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))

property_map <- leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
  addProviderTiles(provider = "CartoDB")%>%
  setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
  addCircleMarkers(lng = ~lon,lat = ~lat, radius = 1, color = ~pal(class), popup = ~paste0("<b>","Property Type: ", class,"</b>","<br/>","Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)))%>%
  addLegend(pal = pal, values = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.8, title = "Property Affected",position = "topleft")

property_map
b) Cluster

Add marker clustering, so that zooming in will reveal the individual locations but the zoomed out map only shows the clusters. Show the map with clusters.

cluster_property_map <- leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
  addProviderTiles(provider = "CartoDB")%>%
  setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
  addCircleMarkers(lng = ~lon,lat = ~lat, radius = 1, color = ~pal(class), clusterOptions = markerClusterOptions(), popup = ~paste0("<b>","Property Type: ", class,"</b>","<br/>","Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)))%>%
  addLegend(pal = pal, values = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.8, title = "Property Affected",position = "topleft")

cluster_property_map

3. Fire Houses

The second data file contains the locations of the 218 firehouses in New York City. Start with the non-clustered map (2b) and now adjust the size of the circle markers by severity (TOTAL_INCIDENT_DURATION or UNITS_ONSCENE seem plausible options). More severe incidents should have larger circles on the map. On the map, also add the locations of the fire houses. Add two layers (“Incidents”, “Firehouses”) that allow the user to select which information to show.

#highest_alarm$TOTAL_INCIDENT_DURATION)
summary(highest_alarm$TOTAL_INCIDENT_DURATION)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     870    4907    6803    7308    8918  428335       1
incidents_firehouse_map <-leaflet(highest_alarm, options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
  addProviderTiles(provider = "CartoDB")%>%
  setView( lat= 40.712742, lng=-74.013382, zoom = 10 ) %>%
  addCircleMarkers(lng = ~lon,lat = ~lat, radius = ~TOTAL_INCIDENT_DURATION/5000 , color = ~pal(class), popup = ~paste0("<b>","Property Type: ", class,"</b>","<br/>","Date: ",str_extract(INCIDENT_DATE_TIME, pattern = "[0-9]+/[0-9]+/[0-9]+"), "<br/>", "Address: ", address, "<br/>", "Spread: ", str_sub(FIRE_SPREAD_DESC, 5,-1)), group = "Incidents")%>%
  addLegend(pal = pal, values = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.8, title = "Property Affected",position = "topleft")%>%
  addMarkers(data = firehouses, lng = ~Longitude, lat = ~Latitude, group = "Firehouses")%>%
  addLayersControl(overlayGroups = c("Incidents", "Firehouses"))
incidents_firehouse_map

4. Distance from Firehouse and Response Time

We now want to investigate whether the distance of the incident from the nearest firehouse varies across the city.

a) Calculate Distance

For all incident locations (independent of severity), identify the nearest firehouse and calculate the distance between the firehouse and the incident location. Provide a scatter plot showing the time until the first engine arrived (the variables INCIDENT_DATE_TIME and ARRIVAL_DATE_TIME) will be helpful.

library(sp)
library(sf)
library(raster)
library(rgeos)
# transform firehouse into sf object
firehouse_sf <- st_as_sf(firehouses, coords = c("Longitude", "Latitude"), crs = 4326)
# build an empty list to store the nearest distance
nearest <-list()
# iterate the fire_building and transform them into sp objects 
# use st_distance to calculate the distances and find the nearest firehouse
for(i in 1:nrow(fire_building)){
  point_sf <- st_as_sf(fire_building[i,], coords = c("lon", "lat"), crs = 4326)
  nearest[i] <- min(st_distance(point_sf, firehouse_sf))
}
length(nearest)
## [1] 14200
any(is.na(nearest))
## [1] FALSE
# create a new column in fire_building about the nearest distance from firehouse
fire_building['distance'] <- unlist(nearest)
distance_and_time <- fire_building %>%
  dplyr::select(INCIDENT_DATE_TIME, ARRIVAL_DATE_TIME, PROPERTY_USE_DESC,HIGHEST_LEVEL_DESC,BOROUGH_DESC, lon, lat, distance)
head(distance_and_time)
## # A tibble: 6 x 8
##   INCIDENT_DATE_T… ARRIVAL_DATE_TI… PROPERTY_USE_DE… HIGHEST_LEVEL_D…
##   <chr>            <chr>            <chr>            <chr>           
## 1 01/01/2013 12:5… 01/01/2013 01:0… 579 - Motor veh… 7 - Signal 7-5  
## 2 01/01/2013 02:2… 01/01/2013 02:2… 429 - Multifami… 7 - Signal 7-5  
## 3 01/01/2013 06:2… 01/01/2013 06:2… 429 - Multifami… 2 - 2nd alarm   
## 4 01/01/2013 07:5… 01/01/2013 08:0… 429 - Multifami… 1 - More than i…
## 5 01/01/2013 09:4… 01/01/2013 09:4… 429 - Multifami… 7 - Signal 7-5  
## 6 01/01/2013 11:0… 01/01/2013 11:1… 429 - Multifami… 1 - More than i…
## # … with 4 more variables: BOROUGH_DESC <chr>, lon <dbl>, lat <dbl>,
## #   distance <dbl>
incident_time <- as.POSIXct(strptime(distance_and_time[['INCIDENT_DATE_TIME']], format = "%m/%d/%Y %H:%M:%S %p"))
arrival_time <- as.POSIXct(strptime(distance_and_time[['ARRIVAL_DATE_TIME']], format = "%m/%d/%Y %H:%M:%S %p"))
waiting_time <- arrival_time-incident_time
distance_and_time['waiting_time_secs'] <- as.numeric(waiting_time)
#There are some mistakes in original records like wrong AM/PM
#Transfrom those wrong records (negative numbers) by adding 12 hours back
for(i in 1:nrow(distance_and_time)){
  if (!is.na(distance_and_time[i,'waiting_time_secs'])&distance_and_time[i,'waiting_time_secs'] < 0) {
    distance_and_time[i,'waiting_time_secs'] <- distance_and_time[i,'waiting_time_secs']+12*60*60
  }
}
head(distance_and_time)
## # A tibble: 6 x 9
##   INCIDENT_DATE_T… ARRIVAL_DATE_TI… PROPERTY_USE_DE… HIGHEST_LEVEL_D…
##   <chr>            <chr>            <chr>            <chr>           
## 1 01/01/2013 12:5… 01/01/2013 01:0… 579 - Motor veh… 7 - Signal 7-5  
## 2 01/01/2013 02:2… 01/01/2013 02:2… 429 - Multifami… 7 - Signal 7-5  
## 3 01/01/2013 06:2… 01/01/2013 06:2… 429 - Multifami… 2 - 2nd alarm   
## 4 01/01/2013 07:5… 01/01/2013 08:0… 429 - Multifami… 1 - More than i…
## 5 01/01/2013 09:4… 01/01/2013 09:4… 429 - Multifami… 7 - Signal 7-5  
## 6 01/01/2013 11:0… 01/01/2013 11:1… 429 - Multifami… 1 - More than i…
## # … with 5 more variables: BOROUGH_DESC <chr>, lon <dbl>, lat <dbl>,
## #   distance <dbl>, waiting_time_secs <dbl>
summary(distance_and_time)
##  INCIDENT_DATE_TIME ARRIVAL_DATE_TIME  PROPERTY_USE_DESC  HIGHEST_LEVEL_DESC
##  Length:14200       Length:14200       Length:14200       Length:14200      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  BOROUGH_DESC            lon              lat           distance        
##  Length:14200       Min.   :-74.25   Min.   :40.50   Min.   :     7.75  
##  Class :character   1st Qu.:-73.97   1st Qu.:40.67   1st Qu.:   362.06  
##  Mode  :character   Median :-73.92   Median :40.72   Median :   561.04  
##                     Mean   :-73.92   Mean   :40.73   Mean   :   666.23  
##                     3rd Qu.:-73.87   3rd Qu.:40.80   3rd Qu.:   863.99  
##                     Max.   :-73.09   Max.   :41.60   Max.   :101812.17  
##                                                                         
##  waiting_time_secs
##  Min.   :   12.0  
##  1st Qu.:  172.0  
##  Median :  208.0  
##  Mean   :  421.1  
##  3rd Qu.:  250.0  
##  Max.   :86816.0  
##  NA's   :23
library(ggplot2)
library(ggthemes)
library(plotly)
df_plot <-distance_and_time %>%
  filter(!is.na(waiting_time_secs))%>%
  filter(distance < 5000) %>%
  filter(waiting_time_secs < 5000)
p <-ggplot(df_plot, aes(x = distance, y = waiting_time_secs))+
  geom_point(alpha = 0.5)+
  labs(x = "Distance From The Nearest Firehouse (m)", y = "Waiting Time For The First Engine (secs)")+
  theme_clean()
p

Now also visualize the patterns separately for severe and non-severe incidents (use HIGHEST_LEVEL_DESC but feel free to reduce the number of categories). What do you find?

fire_levels <- distance_and_time%>%
  filter(!is.na(HIGHEST_LEVEL_DESC))
fire_levels['level']=list(str_sub(fire_levels$HIGHEST_LEVEL_DESC,1,2))
fire_levels$level[fire_levels$level %in% c("7 ", "75")] <- "High Alarm"
fire_levels$level[fire_levels$level %in% c("5 ", "55", "4 ", "44", "3 ", "33")] <- "Medium Alarm"
fire_levels$level[fire_levels$level %in% c("2 ", "22", "11", "0 ")] <- "Low Alarm"
fire_levels$level[fire_levels$level %in% c("1 ")] <- "Undefined Alarm"
fire_levels$level <- factor(fire_levels$level, levels= c("Low Alarm","Medium Alarm","High Alarm","Undefined Alarm"))
fire_levels_plot <- fire_levels %>%
  filter(!is.na(waiting_time_secs))%>%
  filter(distance < 5000) %>%
  filter(waiting_time_secs < 1000)
ggplot(fire_levels_plot, aes(x=waiting_time_secs, y=distance, color = level))+
  geom_point(alpha=0.4)+
  facet_grid(~ level)+
  labs(x = "Waiting Time For The First Engine (secs)", y = "Distance From The Nearest Firehouse (m)")+
  theme_bw()

##### b) Map of Response Times

Provide a map visualization of response times. Investigate whether the type of property affected (PROPERTY_USE_DESC) or fire severity (HIGHEST_LEVEL_DESC) play a role here.

response_time <- fire_levels
response_time['class']=list(str_sub(response_time$PROPERTY_USE_DESC,1,1))

response_time$class[response_time$class == "1"] <- "Assembly"
response_time$class[response_time$class == "2"] <- "Educational"
response_time$class[response_time$class == "3"] <- "Healthcare, Detention and Correction"
response_time$class[response_time$class == "4"] <- "Residential"
response_time$class[response_time$class == "5"] <- "Mercantile and Business"
response_time$class[response_time$class == "6"] <- "Energy Production Plant"
response_time$class[response_time$class == "7"] <- "Manufacturing and Processing"
response_time$class[response_time$class == "8"] <- "Storage"
response_time$class[response_time$class %in% c("9","U","N","0")] <- "Other Property"

response_time$class <- factor(response_time$class, levels= c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))
head(response_time)
## # A tibble: 6 x 11
##   INCIDENT_DATE_T… ARRIVAL_DATE_TI… PROPERTY_USE_DE… HIGHEST_LEVEL_D…
##   <chr>            <chr>            <chr>            <chr>           
## 1 01/01/2013 12:5… 01/01/2013 01:0… 579 - Motor veh… 7 - Signal 7-5  
## 2 01/01/2013 02:2… 01/01/2013 02:2… 429 - Multifami… 7 - Signal 7-5  
## 3 01/01/2013 06:2… 01/01/2013 06:2… 429 - Multifami… 2 - 2nd alarm   
## 4 01/01/2013 07:5… 01/01/2013 08:0… 429 - Multifami… 1 - More than i…
## 5 01/01/2013 09:4… 01/01/2013 09:4… 429 - Multifami… 7 - Signal 7-5  
## 6 01/01/2013 11:0… 01/01/2013 11:1… 429 - Multifami… 1 - More than i…
## # … with 7 more variables: BOROUGH_DESC <chr>, lon <dbl>, lat <dbl>,
## #   distance <dbl>, waiting_time_secs <dbl>, level <fct>, class <fct>
low <- filter(response_time, level == "Low Alarm")
medium <- filter(response_time, level == "Medium Alarm")
high <- filter(response_time, level == "High Alarm")
pal_level <- colorFactor(palette = c("#f03b20", "#feb24c", "#ffeda0"), 
                   levels = c("High Alarm","Medium Alarm","Low Alarm"))
alarm_map <- leaflet(options = leafletOptions(minZoom = 8, dragging = TRUE)) %>%
  addProviderTiles("CartoDB.DarkMatter",options = providerTileOptions(attribution = ""))%>%
  addCircleMarkers(data = low, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, fillOpacity=0.3,
                         color = ~pal_level(level),  group = "Low Alarm", popup=~paste("Alarm Level: ",level,
                           "<br>Response Time: ", waiting_time_secs," seconds", "<br>Property Type: ", class)) %>% 
  addCircleMarkers(data = medium, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, fillOpacity=0.3,
                         color = ~pal_level(level),  group = "Medium Alarm", popup=~paste("Alarm Level: ",level,
                           "<br>Response Time: ", waiting_time_secs, " seconds", "<br>Property Type: ", class)) %>%  
  addCircleMarkers(data = high, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, fillOpacity=0.3,
                         color = ~pal_level(level),  group = "High Alarm", popup=~paste("Alarm Level: ",level,
                           "<br>Response Time: ", waiting_time_secs, " seconds", "<br>Property Type: ", class)) %>%
        setView(lat= 40.712742, lng=-74.013382, zoom = 10) %>%
        addLegend(pal = pal_level, values = c("High Alarm","Medium Alarm", "Low Alarm"), opacity = 0.7, title = "Alarm Level",
                                       position = "topleft")%>%
        addLayersControl(overlayGroups = c("High Alarm","Medium Alarm","Low Alarm"))
alarm_map
long_response <- subset(response_time, waiting_time_secs > 500)
fireIcons <- icons(
  iconUrl = "data/redflame.png",
  iconWidth = 15, iconHeight = 15,
  iconAnchorX = 7.5, iconAnchorY = 8.5
  )
pal_class <- colorFactor(palette = "Tableau10", levels = c("Assembly","Educational","Healthcare, Detention and Correction", "Residential", "Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))

leaflet(options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles(provider = "CartoDB") %>%
addCircleMarkers(data = response_time, lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, color = ~pal_class(class), 
                 fillOpacity=0.3, popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
  addMarkers(data= long_response,icon = fireIcons, 
             popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
  setView( lat= 40.712742, lng=-74.013382, zoom = 10 )

According to the plot above, those big circles and fire flame icons are the incidents with a long response time.Clicking the popup, we can see most of them are residential properties. Then we remove those “Residential” records and find some incidents happened in businesses and assembly also had a long response time.

fireIcons2 <- icons(
  iconUrl = "data/flame.png",
  iconWidth = 15, iconHeight = 15,
  iconAnchorX = 7.5, iconAnchorY = 8.5
  )

pal_property <- colorFactor(palette = "Spectral", levels = c("Assembly","Educational","Healthcare, Detention and Correction","Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"))

leaflet(options = leafletOptions(minZoom = 5, dragging = TRUE))%>%
addProviderTiles("Esri.WorldImagery", options = providerTileOptions(attribution = "")) %>%
addCircleMarkers(data = subset(response_time, response_time$class != "Residential"), lng = ~lon, lat = ~lat, radius = ~waiting_time_secs/5000, color = ~pal_property(class), fillOpacity=0.7, popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
  addMarkers(data= subset(long_response, long_response$class != "Residential"),icon = fireIcons2, 
             popup = ~paste("Property Type: ", class, "<br>Response Time: ", waiting_time_secs," seconds"))%>%
  addLegend(pal = pal_property, values = c("Assembly","Educational","Healthcare, Detention and Correction","Mercantile and Business", "Energy Production Plant", "Manufacturing and Processing", "Storage", "Other Property"), opacity = 0.7, title = "Property Affected",position = "topleft")%>%
  setView( lat= 40.712742, lng=-74.013382, zoom = 10 )

Show a faceted choropleth map indicating how response times have developed over the years. What do you find?

response_borough<-distance_and_time %>%
  select(INCIDENT_DATE_TIME,BOROUGH_DESC, waiting_time_secs)
response_borough['borough'] = list(str_sub(response_borough$BOROUGH_DESC,1,1))
response_borough$borough[response_borough$borough == "1"] <- "Manhattan"
response_borough$borough[response_borough$borough == "2"] <- "Bronx"
response_borough$borough[response_borough$borough == "3"] <- "Staten Island"
response_borough$borough[response_borough$borough == "4"] <- "Brooklyn"
response_borough$borough[response_borough$borough == "5"] <- "Queens"
response_borough$borough <- factor(response_borough$borough, levels= c("Manhattan","Bronx","Staten Island","Brooklyn","Queens"))
response_borough['year'] = list(str_sub(response_borough$INCIDENT_DATE_TIME,7,10))
response_borough$year <- factor(response_borough$year, levels= c("2013","2014","2015","2016","2017","2018"))
head(response_borough)
## # A tibble: 6 x 5
##   INCIDENT_DATE_TIME     BOROUGH_DESC  waiting_time_secs borough   year 
##   <chr>                  <chr>                     <dbl> <fct>     <fct>
## 1 01/01/2013 12:58:10 AM 4 - Brooklyn                160 Brooklyn  2013 
## 2 01/01/2013 02:22:56 AM 2 - Bronx                   147 Bronx     2013 
## 3 01/01/2013 06:20:49 AM 1 - Manhattan               324 Manhattan 2013 
## 4 01/01/2013 07:59:48 AM 5 - Queens                  225 Queens    2013 
## 5 01/01/2013 09:47:27 AM 4 - Brooklyn                118 Brooklyn  2013 
## 6 01/01/2013 11:09:16 AM 1 - Manhattan               155 Manhattan 2013
subset(response_borough, response_borough$borough == "Queens"&response_borough$year =="2013")
## # A tibble: 662 x 5
##    INCIDENT_DATE_TIME     BOROUGH_DESC waiting_time_secs borough year 
##    <chr>                  <chr>                    <dbl> <fct>   <fct>
##  1 01/01/2013 07:59:48 AM 5 - Queens                 225 Queens  2013 
##  2 01/01/2013 07:25:38 PM 5 - Queens                 156 Queens  2013 
##  3 01/02/2013 01:07:58 AM 5 - Queens                 197 Queens  2013 
##  4 01/03/2013 03:44:49 AM 5 - Queens                 244 Queens  2013 
##  5 01/04/2013 04:05:35 PM 5 - Queens                 179 Queens  2013 
##  6 01/04/2013 11:41:57 PM 5 - Queens                 235 Queens  2013 
##  7 01/05/2013 04:18:01 PM 5 - Queens                 259 Queens  2013 
##  8 01/05/2013 05:14:56 PM 5 - Queens                 184 Queens  2013 
##  9 01/06/2013 01:56:34 PM 5 - Queens                 165 Queens  2013 
## 10 01/06/2013 07:42:38 PM 5 - Queens                 184 Queens  2013 
## # … with 652 more rows
average_response_time <-response_borough %>%
  filter(!is.na(waiting_time_secs))%>%
  group_by(borough, year) %>%
  summarise(mean_response_time = round(mean(waiting_time_secs),2))
average_response_time
## # A tibble: 30 x 3
## # Groups:   borough [5]
##    borough   year  mean_response_time
##    <fct>     <fct>              <dbl>
##  1 Manhattan 2013                388.
##  2 Manhattan 2014                380.
##  3 Manhattan 2015                398.
##  4 Manhattan 2016                233.
##  5 Manhattan 2017                639.
##  6 Manhattan 2018                236.
##  7 Bronx     2013                498.
##  8 Bronx     2014                920.
##  9 Bronx     2015                397.
## 10 Bronx     2016                411.
## # … with 20 more rows
library(rgdal)
borough <- readOGR("data/borough_boundaries.geojson", verbose=FALSE)
borough@data
##   boro_code     boro_name    shape_area    shape_leng
## 0         2         Bronx 1186612478.22 462958.187332
## 1         5 Staten Island 1623756421.89 325960.628294
## 2         3      Brooklyn 1937593022.23 738745.840717
## 3         4        Queens 3045878293.26 904188.424111
## 4         1     Manhattan 636602658.764 361212.476577
shp_response <- borough@data %>%
  right_join(average_response_time, by = c("boro_name"= "borough"))
shp_response
##    boro_code     boro_name    shape_area    shape_leng year mean_response_time
## 1          1     Manhattan 636602658.764 361212.476577 2013             387.76
## 2          1     Manhattan 636602658.764 361212.476577 2014             380.41
## 3          1     Manhattan 636602658.764 361212.476577 2015             398.38
## 4          1     Manhattan 636602658.764 361212.476577 2016             233.43
## 5          1     Manhattan 636602658.764 361212.476577 2017             638.94
## 6          1     Manhattan 636602658.764 361212.476577 2018             236.01
## 7          2         Bronx 1186612478.22 462958.187332 2013             497.75
## 8          2         Bronx 1186612478.22 462958.187332 2014             920.34
## 9          2         Bronx 1186612478.22 462958.187332 2015             396.86
## 10         2         Bronx 1186612478.22 462958.187332 2016             410.73
## 11         2         Bronx 1186612478.22 462958.187332 2017             228.81
## 12         2         Bronx 1186612478.22 462958.187332 2018             244.08
## 13         5 Staten Island 1623756421.89 325960.628294 2013             231.58
## 14         5 Staten Island 1623756421.89 325960.628294 2014             234.76
## 15         5 Staten Island 1623756421.89 325960.628294 2015             217.37
## 16         5 Staten Island 1623756421.89 325960.628294 2016             226.07
## 17         5 Staten Island 1623756421.89 325960.628294 2017             223.32
## 18         5 Staten Island 1623756421.89 325960.628294 2018             246.95
## 19         3      Brooklyn 1937593022.23 738745.840717 2013             498.28
## 20         3      Brooklyn 1937593022.23 738745.840717 2014             297.12
## 21         3      Brooklyn 1937593022.23 738745.840717 2015             188.05
## 22         3      Brooklyn 1937593022.23 738745.840717 2016             537.03
## 23         3      Brooklyn 1937593022.23 738745.840717 2017             427.31
## 24         3      Brooklyn 1937593022.23 738745.840717 2018            1007.03
## 25         4        Queens 3045878293.26 904188.424111 2013             351.84
## 26         4        Queens 3045878293.26 904188.424111 2014             492.58
## 27         4        Queens 3045878293.26 904188.424111 2015             482.59
## 28         4        Queens 3045878293.26 904188.424111 2016             359.71
## 29         4        Queens 3045878293.26 904188.424111 2017             535.73
## 30         4        Queens 3045878293.26 904188.424111 2018             248.77
borough@data <-shp_response %>%
  filter(year =="2013")
summary(borough$mean_response_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   231.6   351.8   387.8   393.4   497.8   498.3
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2013 <-borough %>%
  leaflet()%>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
              label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
              highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
  addLegend(pal = pal_response, values = ~ mean_response_time, title = "2013",
            position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
  filter(year =="2014")
summary(borough$mean_response_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   234.8   297.1   380.4   465.0   492.6   920.3
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2014 <-borough %>%
  leaflet()%>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
              label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
              highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
  addLegend(pal = pal_response, values = ~ mean_response_time, title = "2014",
            position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
  filter(year =="2015")
summary(borough$mean_response_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   188.1   217.4   396.9   336.6   398.4   482.6
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2015 <-borough %>%
  leaflet()%>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
              label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
              highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
  addLegend(pal = pal_response, values = ~ mean_response_time, title = "2013",
            position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
  filter(year =="2016")
summary(borough$mean_response_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   226.1   233.4   359.7   353.4   410.7   537.0
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2016 <-borough %>%
  leaflet()%>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
              label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
              highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
  addLegend(pal = pal_response, values = ~ mean_response_time, title = "2016",
            position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
  filter(year =="2017")
summary(borough$mean_response_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   223.3   228.8   427.3   410.8   535.7   638.9
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2017 <-borough %>%
  leaflet()%>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
              label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
              highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
  addLegend(pal = pal_response, values = ~ mean_response_time, title = "2017",
            position = "topleft", opacity=0.7)
borough@data <-shp_response %>%
  filter(year =="2018")
summary(borough$mean_response_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   236.0   244.1   246.9   396.6   248.8  1007.0
pal_response <- colorNumeric("PuOr", domain = borough$mean_response_time)
map2018 <-borough %>%
  leaflet()%>%
  addProviderTiles("CartoDB")%>%
  addPolygons(weight = 1, color = ~pal_response(mean_response_time), fillOpacity = 1,
              label = ~paste0("Mean Response Time: ", mean_response_time, "seconds"),
              highlightOptions = highlightOptions(weight = 5, color = "white", bringToFront = TRUE))%>%
  addLegend(pal = pal_response, values = ~ mean_response_time, title = "2018",
            position = "topleft", opacity=0.7)
library(mapview)
sync(map2013, map2014, map2015, map2016, map2017,map2018, ncol = 3, sync = "all")

Submission

Please follow the instructions to submit your homework. The homework is due on Wednesday, March 25.

Please stay honest!

If you do come across something online that provides part of the analysis / code etc., please no wholesale copying of other ideas. We are trying to evaluate your abilities to visualize data, not the ability to do internet searches. Also, this is an individually assigned exercise – please keep your solution to yourself.